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ABSTRACT 

We present a rigorous mathematical solution to photometric redshift estimation and the more general 
inversion problem. The challenge we address is to meaningfully constrain unknown properties of 
astronomical sources based on given observables, usually multicolor photometry, with the help of a 
training set that provides an empirical relation between the measurements and the desired quantities. 
We establish a formalism that blurs the boundary between the traditional empirical and template- 
fitting algorithms, as both are just special cases that are discussed in detail to put them in context. 
The new approach enables the development of more sophisticated methods that go beyond the classic 
techniques to combine their advantages. We look at the directions for further improvement in the 
methodology, and examine the technical aspects of practical implementations. We show how training 
sets are to be constructed and used consistently for reliable estimation. 
Subject headings: galaxies: statistics — methods: statistical 
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1. MOTIVATION 

The concept of photometric redshift estimation is over 
four decades old. Since Baum (1962) the methodology 
has changed only incrementally but its role in astronomy 
has completely spun around. The astronomy commu- 
nity originally received the idea with serious skepticism, 
which, over ti me, thanks to a series of breakthro ughs in 
the field fe.g.. iKoolflQSSi : iConnollv et al.lll995ah . slowly 
faded. Today the next generation telescopes plan to per- 
form photometric observations only, and completely rely 
on these kind of estimation techniques for most of their 
key science projects including cosmology and large-scale 
structure. 

While getting ready for extracting most of our new sci- 
entific knowledge from photometric measurements, we 
have to examine the current limitations of the various 
techniques and understand the underlying assumptions. 
Essentially all currently existing implementation can be 
categorized into two classes of methods: empirical esti- 
mators and template fitting. Reviewing the history of 
the research area is o utside the scope this study; see 
IWevmann et al.l (|1999f ) for a rich cross section of the field 
instead; now we look at the basic concepts and the differ- 
ences in the traditional methodologies. Empirical meth- 
ods map the relation of the observed and desired proper- 
ties using a training set; e.g., piecewise linear or polyno- 
mial fitting, or via other regression methods like artificial 
neural nets, support vector machines, etc. Template- 
fitting techniques rely on prior knowledge encoded in the 
model's spectral energy distributions (SEDs) that can 
be matched to observations. Why are the current imple- 
mentations of these two so different? There is no fun- 
damental reason, e.g., one could generate training sets 
from model templates. Why do only template-fitting al- 
gorithms use photometric uncertainties and not the em- 
pirical ones? Why do people estimate the redshifts inde- 
pendently from other physical properties, e.g., often use 
empirical redshift estimates and then template spectra 
for type determination? We know these quantities are 
correlated and should be dealt with in a consistent way. 
The answers to these questions are usually direct con- 



sequences of limitations in the models and the measure- 
ments. If the model SEDs matched all the observations, 
we would know everything about all the objects in the 
Universe. The uncertainties would be used more often if 
they provided reliable extra information. 

The "Photo-Z" label currently associated with the 
above methods, should gain a new meaning. We should 
expect more from the codes than a single estimate per 
object. The implementations need to provide the full 
joint probability density functions of all desired physical 
parameters, so we can develop new statistical tools that 
utilize all the information available. 

In this paper, we are not concerned with what observ- 
ables are the best to use or which filter set is optimal 
for special cases of the generalized photometric inversion 
problem, which depend on the specific science cases, in- 
stead we derive a probabilistic formalism to address the 
common issues. In Section [21 we introduce the method- 
ology and derive the formulas for determining the pho- 
tometric constraints on physical properties. Section [3l 
describes the traditional empirical and template-fitting 
algorithms as special cases of the proposed framework, 
and the advanced techniques that go beyond their lim- 
its. In Section IH we illustrate the concepts and detail 
the practical aspects. Section [5] concludes our study. 

Throughout the paper, we use the capital P letter for 
probabilities and the lower-case p letter for probability 
density functions, or PDFs for short. 

2. METHODOLOGY 

We start by formulating the problem as general as pos- 
sible. The challenge is to constrain physical properties of 
sources with some observables in a data set denoted by 
Q, hereafter the query set. Since model spectra would 
never be perfectly suitable for all desired parameters, one 
will need a training set, T. In fact there is no reason to 
demand that these data sets have the same observables. 
The mapping is provided by some model, M . For exam- 
ple, magnitudes of different photometric systems can be 
mappe d on to one another, say, UJFN observations to 
ugriz (jFukugita et al.l [19951 ) by empirical formulas. In 
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general, let a; be a set of observables in the training set 
T that also contains extra information about the physi- 
cal properties ^, and let y denote the observables of the 
query set Q. Our model is parameterized by a vector 9; 

T '■ {Xt, ^t\t^T 

M : 

The model M can predict the observables x and y for 
a given parameter via the density p{x, M) and has 
a prior on its parameters p{9\A d). For exampl e, one can 
build mo dels based on the Coleman. Wu fc W ccdman 
(|1980( 1 or iBruzual fc CharlotI (|2003f l templates that can 
be used to calculate the colors of sources at a given red- 
shift in any particular photometric system. However, the 
modeling goes beyond just estimating the values for a 
given parameter, because the observational uncertainties 
also enter the formula. Later on, we will discuss in de- 
tails how to establish various models; for now, the above 
functions are assumed to be known. Furthermore, let us 
assume that the training set samples the entire space of 
the observables, and discuss the selection effects later. 

Our goal is to derive the probability density function 
(PDF) of the physical properties ^ for a given query point 
q with observations using our model M. This func- 
tion, M), is the solution of the generalized photo- 
metric inversion problem and the subject of this section. 
The next two paragraphs discuss probabilistic concepts 
analogous to elements of template fitting and empirical 
estimation, respectively, in the context of our probabilis- 
tic formalism. Next we address the burning issues of 
selection effects and feasibility. 

2.1. Mapping the Observables 

The first step is to make the connection between the 
observables. It can be done formally by calculating the 
probability density of x for the query point q. We do 
this via the equality of 

where the right-hand side contains integrals of known 
functions over the model's parameter domain 

p{x,y^\M) = Jd6pie\M)p{x,y^\e,M) (2) 

and over x for the marginalization 



P(.yq\M) ^ Jdx p{x,yg 



M) 



(3) 



We see how this is superior to the techniques analogous 
to the traditional way. The usual solution involves fit- 
ting for the best-match model parameter using, for exam- 
ple, maximum likelihood estimation (MLE), and accept- 
ing that parameter at face value to derive the estimates. 
Here, we consider all possible model parameters and add 
up their contributions. 

We note that the above general mapping formula is 
valid in case of improper priors, too, in the sense that 
the posterior is always properly normalized to unity. If 
one has no prior knowledge about the model parame- 
ters, and wishes to use a noninformative prior, e.g., flat 
p{0\M) — 1, formally he/she is allowed to do so; see more 
on the priors later on. 



2.2. Physical Properties 

Next we establish the relation between the observable 
and the desired physical parameters. The traditional way 
is to assume the properties of interest to be a function 
of the observables. Some of the existing methods utilize 
explicit functions such as a polynomial or piecewise lin- 
ear, while others use more obscure mappings such as a 
decision tree or an artificial neural net. Conceptually, 
they are just assuming a fitting function 



(4) 



which is tuned to reproduce the elements of the training 
set as best as possible. The problem with this assump- 
tion is that there is no guarantee that the same x ob- 
servables always correspond to the same ^ properties. In 
fact, we know that degeneracies are present in most data 
sets. Clearly, the above assumption is an unnecessary 
restriction over the general relation of x and ^ denoted 
by p{£,\x). In other words, the traditional model is 



P{i\x)=5{\i-i{x)\) 



(5) 



using Dirac's 5 symbol. 

A better way is not to restrict the distribution arbi- 
trarily to an unknown surface but to leave the formula 
general. We can establish the proper relation by observ- 
ing the fact that 



p{$.\x) 



p{x) 



(6) 



The right-hand side is a ratio of two densities that 
(both) can be estimated from the training set, e.g., using 
Voronoi tessellation or kernel density estimation (KDE) . 

Having derived the above relation, one can compute 
the final PDF of interest as the integral over the possible 
observables in the training set 



(7) 



pii\yg,M)^ dxp{i\x)p{x\yg,M) 



When it is possible to accurately characterize this distri- 
bution by a Gaussian function or some mixture model, 
one can compress the numerical results into a few param- 
eters. When the PDF is unimodal, which is often not the 
case, the expectation value should suffice for an estimate 



(8) 



d^ip{^\y^,M) 



The above equ ation is similar to kernel regression 
(|Nadaravalll964l ) in case of using KDE, except it is a gen- 
eralization to incorporate the uncertainties in the data 
sets. 

Photometric redshifts and other such properties are of- 
ten used in statistical studies for their availability for a 
large number of sources, even though they provide rel- 
atively loose constraints on individual objects. The full 
PDFs of the sources are best suited to derive the en- 
semble properties of entire catalogs or even specific sub- 
samples. The distribution of the properties over a set of 
measurements Q is given by the average 

Hence there is no need for an extra deconvolution step 
to recover the underlying distribution of the objects in 
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a sample, because their average PDF is exactly that. A 
common example is the estimation of the redshift dis- 
tribution dN/dz for various subsamples, say, at differ- 
ent distances. When selection bias is not an issue for 
the scientific analysis, e.g., lensing studies, one can even 
choose the subsets to optimize the contrast of the aver- 
aged PDFs. 

2.3. Selection Effects 

The inherent limitations of a finite training set pose 
a serious problem for any estimator, which is often ne- 
glected. Our formalism introduced earlier is no excep- 
tion, hence we now turn to examine the effects. The 
selection function is the probability of a source, with ob- 
servables x making it into the training set, P{T\x). The 
region that the training set can sample is the window 
function P{W\x), which takes the value of 1 where the 
selection function is nonzero, and otherwise. For ex- 
ample, 

' 1 if V[x) < 22 
otherwise 



P{W\x) 



(10) 



for a survey that has a magnitude limit of 22 in a V band. 

The selection function is expected to enter our method 
at two separate places: the marginalization over x and 
via the density estimates used for the relation p{^\x). 
The former appears to be inevitable but causes problems 
only at the boundaries of the selection criteria. If the 
integrand p(a;|y^, AI) in equation ^ vanishes within the 
integration domain of the window function P{W\x), the 
results are valid. Otherwise the estimated PDF is biased 
in an unknown way. The probability of q being inside 
the window function is the right indicator of the problem 
occurring 



(11) 



P{W\yg,M)^ dx P{W\x)p{x\y^,M) 



When this probability is close to 1, the training set pro- 
vides good support for the photometric inversion prob- 
lem, but when the value is low, the query point is known 
to be outside the regime of the training set. 

The relation between the desired properties and the 
observables is the other issue as it is only probed on the 
training set. The relation as seen on the training set 
depends on the true relation and the selection function 
via the equation 



p{i\x)P{T\x,^) 
P{T\x) 



(12) 



If the selection function strictly depends only on x, we 
have P(T|a;,^) — P{T\x) and find that the empirical re- 
lation is identical to the true one on the selection domain. 
If the sampling frequency is low, the measured relation 
is noisier and less robust numerically. 

This is a critical point, which is worth emphasizing 
once again: the p(^|a;,T) — p{(,\x) equality holds only if 
^ does not influence the selection in any way, not even in- 
directly via some hidden parameter. A counter example 
is the common case of cutting on morphological parame- 
ters in the selection function, while only considering the 
fluxes for x. Another interesting consequence is that one 
cannot use only the colors to estimate, say, photometric 
redshifts, if a magnitude cut was involved in the selection 



of the training set. Yet another issue is cosmic variance, 
which might cause the relation to depend on the posi- 
tion in the sky. The solution in all cases is to revise the 
selection of the training set, if possible, or to add the hid- 
den observables into a;, and extend the model to include 
them. 

3. MODELS IN THE TRADITIONAL LIMITS AND BEYOND 

Previously we have hinted at how models can be con- 
structed but, until now, they have just been assumed to 
be known. A model is a combination of the limitations 
in our observations, both in the training and query sets, 
and the parameterization of the observables. From dis- 
cussing the topic in the most general way, we now turn to 
the practicalities of real-life astronomical observations. 

Today the errors of extracted fluxes of photometric 
measurements are independent estimates of the uncer- 
tainties in the separate passbands. Typically, Gaussian 
errors are assumed, and the catalogs would quote Icr val- 
ues for every source. Analyzing the repeated observa- 
tions i n the Sloan Dig it al Sk y Survey (SDSS; York et al] 
l2000f ) , IScranton et al.l (|2005D have shown that this sim- 
ple picture is wrong, and the off-diagonal elements of the 
covariance matrix are significant. This is not surprising. 
One of the major components in the photometric uncer- 
tainty is the error in the determination of the aperture. 
If the multicolor measurements share a common aper- 
ture, e.g., SDSS model magnitudes that are best suited 
for colors, the fiux measurements will be inevitably cor- 
related. Thus an improved error model of the photomet- 
ric observations is described by a multivariate normal 
distribution, iV(a;|S, C^;), with a mean of x and covari- 
ance matrix Cx- The next generation survey telescopes 
that plan to visit the sources on multiple occasions will 
be able to better determine the full covariance matrices 
from actual observations to improve our understanding 
of the errors. Hence, for now it is general enough to 
consider error estimates that are fully described by the 
covariances. 

In this reasonable approximation, the p{x,y\9,Ad) 
mapping is also a normal distribution with a full covari- 
ance matrix that includes cross-catalog terms, if neces- 
sary, that go beyond the calibration work on the indi- 
vidual catalogs. If the apertures are locked together for 
better color determination, one has to obtain the de- 
pendencies via a data set that contains sources with all 
X and y measurements. However, when the processing 
pipelines are independent, one can assume that the un- 
certainties in X and y are also independent, and write a 
realistic Af as the product of the two Gaussians: 

pix,y\e,M) = Nxix\xie),Cx{e)) 

xNy{y\y{e),Cy{e)) (13) 

The dependences in the means x{6) and y{8) are 
straightforward to model and, even in the most com- 
plicated case, are similar, in spirit, to the traditional 
template-fitting procedures. For example, when consid- 
ering a synthetic model of galaxies, one has to vary the 
redshift, age, optical depth, and so on, to derive high- 
resolution model spectra for different parameters, and 
then convolve them with the broadband filters to get the 
fluxes. 

Clearly modeling the covariance matrices is more com- 
plicated and would require many more parameters to 
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model accurately. If is a minimal set of parameters 
that is enough to describe x{9) and y{0), there are some 
other hidden parameters or hyperparameters that are 
also needed for the covariances. The fully Bayesian way 
is to establish the relation of the covariance matrix and 
the hyperparameters along with a hyperprior (the prior 
on the hyperparameters), and to marginalize over the ex- 
tra dependence. Even though, this relation between the 
elements of the covariance matrix and the observables 
could, in principle, be modeled based on the catalogs, it 
may prove impractical. The empirical Bayes approach, 
admittedly more optimistic but easily quantifiable, is to 
find the most likely hyperparameter and substitute it into 
the dependence. In practice, for every parameter 6, one 
can find the values of x{9) and y{6) and the closest mea- 
surement points, whose covariance matrices are good es- 
timates. If the covariance matrix changes slowly with 
X compared to its widths, one can safely calculate the 
values at the catalog points by using the corresponding 
error matrices, 

pixt,yg\e,M)^N,{xt\x{e),Ct) 

xNy{y^\y{9),Cg) (14) 

The only concern with this approximation is the noise 
on the elements of the covariance. If needed, one could 
improve on the stability by smoothing or fitting locally 
over the catalog entries. 

The consequences of the model approximation in equa- 
tion (I14p are most intriguing from the implementation 
aspect of the methodology. As long as we only evalu- 
ate the PDFs at the observed locations, the calculations 
are more straightforward and computationally less ex- 
pensive. 

3.1. Numerical Evaluation 

The field of numerical evaluation of complicated mul- 
tidimensional integrals that usually emerge in Bayesian 
analysis such as ours is well studied. The solution typ- 
ically involves some randomized algorithms that range 
from simple direct sampling from the prior to adap- 
tive strategies often based on Markov chain Monte Carlo 
(MCMC) methods, e.g., Gibbs sampling. Although this 
topic is beyond the scope of the present discussion, we 
briefly touch on the basic idea to illustrate the concepts 
and provide some insight on how to derive the final re- 
sults numerically, namely the value of P{W\yg,M) and 
the function p(^|y^, M). 

The clever construction of the chain in the MCMC al- 
gorithm yields model parameters {9i} that can be con- 
sidered independent random realization drawn from the 
posterior distribution, p{6\y M) in our case. With the 
chain in hand, one can readily approximate the integral 
by the average over the MCMC samples. The mapping 
of the observables then becomes 

p{x\yg,M)^{N.,{x\xie,),Ct)) (15) 

where t is the index of the training point Xt closest to 
x{Oi). When the query point is well within the regime 
of the training set, this approximation is valid. What 
happens otherwise? Often the uncertainties are larger 
outside the selection criteria, e.g., the photometric er- 
rors beyond the flux limit. By using the covariance 
matrix of the closest training point, one actually arti- 
ficially decreases the contribution to the integral making 



p{x\yq, M) tighter. While the accuracy of the calculation 
is affected, the change is such that it reduces the value 
of the integral in P(VF|y^,M), which is the measure of 
reliability. Hence, if we measure a large value, we can be 
confident of the result. Having said that we note that 
in practice the covariances probably do not change fast 
enough to pose a significant problem in this calculation 
for the objects along the edge of the selection function, 
and farther away the probabilities are very small anyway. 

Once we know that the estimation is in the safe regime, 
we can compute the p{$,\yq^ M) integral ignoring the win- 
dow function completely by summing up at preset 
points in our region of interest, e.g., a fine redshift grid, 
as 

p(C|y„r,M) cx^p(|,|a:„r)^^^f^ii^ (i6) 

^ p{xt\T) 

where the p{xt\T) densities and the matrix p{^j.\xt,T) 
are obtained from the numerical density estimates once 
for the training set; see equation ([6|). Here, we made use 
of the fact that the {xt} points are (naturally) drawn 
from the distribution p{x\T). 

In order to perform these summations efficiently for 
many query points, one has to utilize fast searching mech- 
anisms in the space of the observables. The situation is 
complicated by the strong correlation in the observables 
and the varying Mahalanobis metric, yet, a significant 
speedup can be achieved by adequate mul tidimensional 
indexi ng of the color-space as described in ICsabai et al.l 
(|2007l) . 

3.2. Template Fitting 

In classical SED-fitting approaches, one does not tech- 
nically have a training set. Although, formally it can 
be generated from a grid of model parameters {9t} as 
{xt,^t} = {x{et),i{6t)}, where l{9) is often simply a 
subset of 6, e.g., the redshift is just one of the parame- 
ters in the models of SEDs. Traditionally, this artificial 
training set has no errors associated with the reference 
points, hence we have 

p{x\e,M) = s{\x-x{e)\) (17) 

and, assuming x{0) has an inverse, 

p{i\xt,M) = S{\^-^t\) (18) 

The analytical calculation yields an intuitive result, 
where the grid points are weighted by their likelihood 
multiplied by the corresponding prior 

p(^|y,, M) « J2 S{\$-^t\)piOt\M) N{yq\y{dt),C,) 
teT 

(19) 

In the limit of a flat prior, this is the classic MLE case, 
which is equivalent to the minimization techniques 
used in most SED-fitting implementations today, where 
the measurements are compared to the simulated obser- 
vations at the grid points to select the optimum. One ob- 
vious exception is the algorithm of lBenitei ()200CI[ ). which 
actually applies an explicit empirical redshift prior in this 
equation, hence it is often referred to as "Bayesian." 

The selection of a set of templates is another simple 
prior but on the spectral type, even if well hidden, im- 
plicit, and not often admitted. Researchers routinely 
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seek for templates that provide the best redshift esti- 
mates. Strictly speaking, this is cheating. The selection 
should be based on how well the templates represent the 
data in the space of the observables, and not based on 
their performance in the estimation. Naturally, there 
is a connection, but not in that direction. The tem- 
plates that follow the data will likely provide better esti- 
mates; however, templates that yield good estimates are 
not guaranteed to match the data. The development 
of a class of methods b y iBudavari et all (| 19991 . |2000| . 
[200I and iCsabai et al.l (|2000D can be considered early 
attempts to achieve a better SED prior. Here, the tem- 
plates are statistically modified to represent the obser- 
vations more accurately, while not optimized for redshift 
estimation whatsoever. Clearly, these are just the first 
steps in this direction. Instead of just assigning 1 and 
weights to the templates by either including them or 
not (respectively) as typically done today, one can explic- 
itly formalize more realistic priors over a broader range 
of SEDs that are driven by scientific knowledge and/or 
ensemble statistics. 

An obvious but rather important improvement in the 
new framework is the ability to naturally introduce and 
utilize the uncertainties of the template spectra. We 
know that the models are not perfect, and this can be eas- 
ily characterized. As an example, one can use the same 
prescription for the spectral synthesis, but build on a 
various stellar libraries to analyze the differences. When 
using empirical templates, the implementation is even 
more evident. We fold in the uncertainties by abandon- 
ing the simplified relation in equation (|17p and creating 
a more realistic model with the estimated finite errors. 

3.3. Empirical Method 

The new methodology in the limit of the classic em- 
pirical algorithms goes well beyond the usual techniques, 
which consist of simply establishing the fitting function 
in equation ([4]). We can utilize those fits (or preferably 
estimate the densities numerically to map the full rela- 
tion), but we can also properly consider the uncertainties. 

The parameterization of a minimalist model is done 
by a position in the space of the observables, i.e., 6 is 
the same type of quantity as x and y, e.g., UBVI fluxes. 
Namely, we choose x{0) = 6 and y{8) — 9. Even though 
the observables in x and y are the same quantities, the 
mapping is still required to fold in the photometric errors. 
With an improper flat prior p{6\M) = 1, the mapping of 
the observables is integrated analytically 

p{xt\y^,M) = Jde N{xt\e,Ct)N{e\y^,Cg) (20) 

While this model is clearly very simple, it is quite pow- 
erful and conceptually more sound than a number of tra- 
ditional methods. Wc will use it for illustrations in the 
upcoming discussions. 

Other simple forms of priors can also be handled ana- 
lytically, e.g., linear and Gaussian, that may be reason- 
able approximations at least locally. Otherwise we resort 
to the numerical evaluation. 

3.4. Advanced Methods of the Future 

The problem with the classic empirical methods is the 
requirement of having the same set of observables for 



both the training and the query sets. The limitations of 
the SED-fitting techniques come from the fact that the 
models cannot perfectly describe the relation of observ- 
ables and the physical properties. 

In the realm of our unified framework, we can have 
more advanced methods that combine these two previ- 
ously separate classes of techniques. We can introduce 
new algorithms to take advantage of the training points 
even if their photometric observables differ from those 
in the query set. The idea is the following: True to the 
spirit of empirical methods, we utilize the training set to 
provide the relation between the physical properties we 
wish to constrain and some observables; see equation ([6]). 
In addition to this empirical relation, we apply a map- 
ping from the observables of the query set to that of the 
training set based on SED modeling, like in the tem- 
plate fitting procedures. For example, if the training set 
contains UJFN magnitudes, one can map them to ugriz 
using equation ([T]). 

The intriguing observation to make here is that one 
does not even need realistic physical models to start with, 
because the physics is in the training set and not the 
model. Let us consider a model M, which is a complete 
basis on the observed wavelength range, e.g., Legendre 
polynomials or Fourier series with the parameterization 
by their coefficients. The manifold of the physical spec- 
tra is naturally contained within. In practice, this model 
needs to be only sufficiently complete and band-limited 
so that real SEDs can be well described; this is a weak 
prior that we can set up based on all the spectra we 
observed and simulated before. A model spectrum cor- 
responding to a certain parameter value in M can be 
convolved with the appropriate transmission curves to 
yield the observables x{0) and y{9), even if they are un- 
physical. Hence, formally we have the basis of our map- 
ping, p{x,y\0, M). As long as the data provide good 
enough constraints on the model parameters, the map- 
ping is valid and the algorithm follows the routine. When 
the observations barely constrain the model parameters 
and large volumes of unphysical SEDs have significant 
likelihood, the mapping will be wrong. The solution is 
to apply a prior to consider only the physical SEDs. Us- 
ing the entire catalog, one can derive an empirical phys- 
ical prior statistically, which we will discuss in the next 
section. 

These new advanced methods overcome the usual dif- 
ficulties in photometric redshift estimation, and offer a 
way out of the half-century-old dilemma. They are a 
natural extension of everything that has worked before 
in the field: a straightforward combination of the two 
previously separate methodologies. 

4. DISCUSSION 

Next we demonstrate the new framework in action by 
applying a simple model to real-life data, which is fol- 
lowed by discussions of the qualities of training sets and 
the prior. 

4.1. A Case Study 

To illustrate the concepts introduced earlier, we ap- 
ply the aforementioned minimalist empirical model to 
a sample of galaxies. We choose SDSS sources for 
their well-studi e d pho tometric uncertainties. Following 
IScranton et al.l ()2005[ ). we estimate the full covariance 
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Fig. 1. — The probability density as a function of the redshift for early- and late-type galaxies {upper and lower panels, respectively) at 
different distances marked by the vertical dashed lines. For every object, the dotted line shows the empirical relation of p{z\x = rrtq), and 
the solid line illustrates the final result of p{z\y = mq, M) after properly folding in the photometric uncertainties via the mapping in the 
model. 



matrix for all objects, and utilize them in the subse- 
quent analysis. We randomly sel ect a quarter of the en- 
tire Main Galaxy Sample (MGS: IStrauss et al.l[200l ) of 
DR6 to be the training set, roughly 100 thousand ob- 
jects. Our query set is a smaller disjoint random subset 
for illustration purposes. First, we map the observables 
(magnitudes to magnitudes) analytically using our sim- 
ple model in equation (PD|) . The calculation is done inside 
the DR6 database by SQL User-Defined Functions. 

Next, we compute the conditional PDFs b y a dual-tree 
KDE implementation (i Grav fc Mooril2003l: ' lLee fc Gravl 
|2006[ ) at preset locations defined by the T training and 
Q query sets in magnitude space and a uniform high- 
resolution redshift grid. The practical complication 
with any density estimation is the fact that it is scale- 
dependent and changes with the metric. We are further 
limited in our applications to fix bandwidths for the con- 
ditional density estimation in the current implementation 
of the estimator. We adopt a bandwidth oi h = 0.004 in 
a metric that scales the magnitudes to the redshift. In 
other words, the resolution in redshift space is set by h, 
the full width half maximum of the normal distributions, 
and we re-scale the magnitudes by a factor of / = 0.08 to 
reasonably match the density of the sources in the sep- 
arate subspaces. This simple technique is expected per- 
form reasonably well within the regime where the sources 
are suitably dense but not in the outskirts where a larger 
variable bandwidth is needed in magnitude space. The 
theory of more sophistic ated conditional density estima- 
tion is well-studied (e.g.. lFan et al.llT99^ . and advanced 
adaptive implementations are in the works to help out 
(Lee & Gray, 2008; private communication). 

Figure [1] illustrates the nature of the x — ^ rela- 
tion, in this case the multicolor measurements and red- 
shift p{z\mg), as well as the final redshift distribution. 



p(z\mq, M), incorporating the photometric uncertainties 
in our model. We see that the redshift is really not a sim- 
ple function of the magnitudes but rather a more general 
relation. This is even more so for observables that con- 
strain the physical properties less than the ugriz mea- 
surements. The relation itself (shown as a dotted line) 
might provide an overly optimistic view of the uncer- 
tainties at times and usually much noisier than the final 
PDF (shown in solid) that sums up these relations with 
appropriate weights. The top panels show intrinsically 
red galaxies at three different redshifts, which were se- 
lected based on the mixing angle of the first two principal 
components, also known as the eClass in the SDSS ter- 
minology. The bottom panels show the more problem- 
atic blue galaxies at similar redshifts. Note the consis- 
tent performance of the estimator on the red sources as a 
function redshift in comparison to the blue galaxies that 
have broader PDFs at higher redshifts and are noisier, 
especially at the largest distances. 

In the bottom rightmost panel, the distribution is 
not even centered around the spectroscopic redshift, but 
skewed toward lower values. This object is very close to 
the edge of the training set, and the result would be con- 
sidered unreliable due to the lack of calibrators at higher 
redshift that would still be within the sources photomet- 
ric uncertainties. 

4.2. Sampling Frequency 

A very attractive feature of the new PDF estimator 
derived earlier in equation ([8]) is its conceptual indepen- 
dence from the sampling of the calibrators. Many statis- 
tical tools rely heavily on having a representative training 
set and only provide unbiased results in that limit. In our 
case, the training points simply provide locations where 
the evaluation is feasible and their density is essentially 
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Fig. 2. — Results obtained from the stratified training set look much like the those from the full sample, which shows that the 
representativeness does not matter; instead the training sets should be optimized for the broad coverage of the observable volume with 
highest sampling rates in the outskirts. 



just a resolution factor. The sampling frequency of the 
training set only affects the accuracy of the numerical 
integral in equation but not in a systematic way as 
long as the query point is well within the boundaries of 
the window function. A denser training set will provide 
higher resolution in the summation, but there is a practi- 
cal limit beyond which one expects no improvement. The 
reason is that the new calibrator sources are essentially 
identical to the ones already in the training set. 

The number of spectroscopic measurements to be car- 
ried out for calibration purposes is limited by finite re- 
sources. It is vital to acquire reliable training sets for the 
new generation photometric studies. A good training set 
has a well-defined selection function, using criteria based 
on only the observables one plans to model for the es- 
timation, and, within that, a smart adaptive sampling 
strategy to optimize the coverage in observable space. 
Clearly, the densest regions can be subsampled, but one 
needs all training points in the outskirts of the manifold 
for broad support. For this reason, the simplest random 
subsampling of the underlying population will not suffice. 
Instead, a stratified sampling strategy is to be pursued. 

To demonstrate that the methodology is robust to this 
kind of systematic changes in the training set, we create 
a stratified subset and perform the previous analysis the 
same way. The sampling is done by including sources 
randomly based on their local density p{x\T) in magni- 
tude space. A galaxy is included in the training set only 
if the ratio of some constant po and the local density is 
larger than a randomly generated real number, Uqi, uni- 
form between and 1, i.e., po/p{x\T) > /7oi. We set the 
value of po to yield a subsample that is half the size of the 
original data set. Figure [5] shows the results for the pre- 
viously selected sources based on the smaller stratified 
subset. The basic shape of the curves is practically the 



same in most cases, only somewhat noisier but without 
systematics. One exception is the blue galaxy at around 
z = 0.1, where the subsampling somewhat amplifies the 
effect of the large wall in SDSS at z = 0.08. The blue 
galaxy at the highest redshift is essentially unchanged 
(except for the part at the lowest redshift where the den- 
sity in magnitude space is larger to start with) because 
the stratified sampling (by construction) has no effect on 
its already very sparse neighborhood. 

Optimal sampling is difficult to achieve. In fact, it is 
difficult even to define. In addition to the photometric 
uncertainties, the desired resolution of the physical quan- 
tities also sets limits on the sampling frequency. This is 
prominent in the case of degenerate regions where an ex- 
tended part of the physical parameter space is cramped 
into a small volume of observables. Simulations built on 
realistic SED models can help cross-check these factors, 
and evaluate the performance of the estimator ahead 
of time. In the ideal case, one would create stratified 
training sets in the space of the physical parameters in- 
stead of the observables, which should be more feasible 
in the near future with im proved spectral modeling (e.g., 
iCharlot fc Bruzuall 120091) . 



4.3. Empirical Priors 

The distribution of sources in a training set may be 
artificial and, as we just argued, should be optimized for 
coverage with a practical upper bound on the density 
tuned to the photometric inaccuracies and source diver- 
sity. However, the distribution in the query set is often 
physical and can be used to derive an empirical prior 
for our model. The basic observation is that the density 
of sources in the query set, p{y\Q)^ should match the 
predicted density of the model, p{y\Ai). The latter is 
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calculated for any prior as the convolution, 

p{y\M)^ Jdep{y\e,M)p{e\M) (21) 

If we substitute p{y\Q) measured from the sources on the 
left-hand side of the equation, the only unknown is the 
prior, which we ca n solve for u s ing th e ele gant deconv o- 
lution technique of iRichardsonI ()1972[ ) and lLucvl (|1974f ). 

To see why a physically sensible prior is important, 
let us consider the density of sources in the training set 
within the window function. Since the density is propor- 
tional to the product of the underlying p{x) density and 
the selection function, 

p{x\T) (X p{x) P{T\x) (22) 

a significant volume of the window function is not sam- 
pled by the training set, where p{x) is zero. Without 
a reasonable prior, the mapping p(a;|i/^, ill) could yield 
wrong weights for unphysical observables in the summa- 
tion of equation ([7]). Hence, any model needs some physi- 
cal information. Even if one is hesitant to take the empir- 
ical prior at face value, the domain of the model param- 
eters should be carefully considered. In case of template 
fitting, this happens implicitly, even if not optimally, via 
the selection of the set or manifold of template spectra, 
but can be also done for even the empirical algorithms. 

5. CONCLUSIONS 

Starting from first principles of Bayesian probability 
theory, we built a description and obtained the solution 
of the generic photometric inversion problem, where the 
physical properties of sources are constrained based on 
observational measurements. The new approach yields a 
formalism that encapsulates the field of photometric red- 
shift estimation, and contains the traditional methods as 
special cases. In our systematic analysis of the mathe- 
matical problem, we put previous techniques in context 
and pointed out the directions for improvement in each. 

The proposed extensions to the current methods rep- 
resent significant progress in more respects. We avoid 
the common assumption of the physical properties being 
a single-valued function of the observables by treating 
their relation in a more general way. Thus the formalism 
is not prone to fail in regions, where the data sets are 
degenerate. We showed how to estimate the correspond- 
ing probability density of this relation. In addition, the 
uncertainties of the observables are propagated all the 
way to the results via explicit modeling of the accura- 
cies. We discussed various aspects of the modeling from 
the simplest empirical case to the application of SEDs. 

This general framework allows for the construction 
of novel, more advanced methods that combine the at- 
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tractive qualities of empirical and template-fitting algo- 
rithms. One can build empirical estimators based on 
training sets that have different observables from the 
query set, e.g., UJFN photometry to ugriz, via SED 
modeling. We can improve the methods by creating 
more and more realistic models that include, for example, 
the strengths of t he emission lines in galaxies (following 
Gvorv et a l. '2009^ and their inclination angles (based on 
Yip et all 12009) among the model parameters to prop- 
erly marginalize over the nuance parameters for a more 
reliable mapping of the observables. 

The current limitations come from the lack of good 
understanding of the photometric uncertainties. From 
previous studies, we know that the flux measurements 
in various passbands are correlated, yet, most catalogs 
only quote errors on the individual fluxes. For more 
precise scientific measurements via tighter photometric 
constraints, we need better photometric error models in 
the future. Upcoming survey telescopes will observe all 
sources multiple times, hence will be able to get a bet- 
ter handle on the errors and their covariances. Under- 
stadning these systematics is probably one of the highest 
priority tasks in the preparation for the upcoming era of 
photometric science. 

The proper solution of the generalized photometric in- 
version problem may be straightforward on paper, but ef- 
ficient implementations of realistic models with appropri- 
ate priors involve many advanced concepts in statistics, 
and can only be built on the most recent and on-going de- 
velopments in computer science, e.g., multi-dimensional 
indexing in databases. Even then the computations are 
not trivial to carry out, and have significantly higher de- 
mand for compute power than previous methods. The 
immediate future work is to have such a unified frame- 
work developed and ready for the next generation imag- 
ing surveys. 
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